CHASM: Static Analysis and Automatic Code Generation for Improved Fortran 90 and C++ Interoperability

Author

  • C. E. Rasmussen
Abstract

The relative simplicity and design of the Fortran 77 language allowed for reasonable interoperability with C and C++. Fortran 90, on the other hand, introduces several new and complex features that severely degrade interoperability in a mixed Fortran and C++ development environment. Major new items added to Fortran are user-defined types, pointers, and several new array features. Each of these items introduces difficulties because the Fortran 90 procedure calling convention was not designed with interoperability as an important goal. For example, Fortran 90 arrays are passed by array descriptor, which is not specified by the language and therefore depends on a particular compiler implementation. This paper describes a set of software tools that parses Fortran 90 source code and produces mediating interface functions which allow access to Fortran 90 libraries from C++.

1.0 Introduction

Fortran is usually an integral part of the computing environment at a scientific institution like Los Alamos National Laboratory (LANL). At LANL, there are many legacy Fortran applications and libraries, as well as scientific programmers who use Fortran as a language of choice. Legacy Fortran applications have frequently been written in Fortran 77 using static arrays as the dominant data structure, e.g. [1]. These static arrays were well suited for the time, as they mapped easily onto the fixed grid models of the legacy codes. However, as scientific models have increased in complexity, Fortran 77 has proved not expressive enough to provide the language support that many of the new models require. For instance, computational fluid dynamics (CFD) models increasingly require high resolution in certain regions of interest (to resolve the physics), while in other modeled regions physical parameters may vary slowly enough that a coarser grid is appropriate, e.g. [2].
Thus, the tendency has been for scientific models to move from static fixed grids to particle and fluid meshes that adapt to flow and other dynamic model parameters.

Fortran 90 provides many new features that are attractive to Fortran 77 programmers [3]. The most important of these include data structures and user-defined types, many new array features and operations, pointers, increased support for code modularization, and improved type safety. The new array features include the ability to process entire arrays as a single object and to dynamically allocate (and deallocate) arrays. While the rank (i.e., the number of dimensions) of an array must be specified at compile time, the shape (i.e., upper and lower bounds in each dimension) may be deferred until runtime. These new features in Fortran 90 allow the increased complexity of the newer scientific models to be expressed more easily and naturally.

At the same time as scientific programmers have been migrating to Fortran 90, many have also begun to use C++. In addition, younger scientific programmers frequently use C++ because of the dominance of C and C++ in the educational and commercial environments. This has led to a mixed programming environment of Fortran 90 and C++. While Fortran 77 and C are largely able to coexist in a mixed programming environment (for example, libraries such as MPI frequently have both a Fortran and a C interface [4, 5]), this is not the case with Fortran 90 and C++. One of the main problems arises when one wishes to share arrays and user-defined types between Fortran 90 and C++ modules. Arrays and pointers to an array, for example, are passed via a descriptor to procedures with explicit interfaces. Unfortunately, there is no Fortran standard that describes the layout of an array descriptor.
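To make the descriptor problem concrete, the following C++ sketch shows two hypothetical descriptor layouts for the same rank-2 array with bounds (0:2, 1:4). The struct names, field names, and field order are assumptions invented for illustration and do not correspond to any real compiler; the point is only that both layouts carry the same rank and shape information while being incompatible in memory.

```cpp
#include <cassert>
#include <cstddef>

// Fortran 90 limits arrays to rank 7.
constexpr int kMaxRank = 7;

// Per-dimension information (hypothetical layout).
struct DimInfo {
    long lower;   // lower bound
    long upper;   // upper bound
    long stride;  // distance between successive elements, in elements
};

// Hypothetical "vendor A" descriptor: base address first, then rank and bounds.
struct DescriptorA {
    void*   base;
    int     rank;
    DimInfo dim[kMaxRank];
};

// Hypothetical "vendor B" descriptor: rank first, extents instead of upper
// bounds, base address last.
struct DescriptorB {
    int         rank;
    std::size_t elem_size;
    long        lower[kMaxRank];
    long        extent[kMaxRank];
    void*       base;
};

// Describe a(0:2, 1:4) in layout A. Fortran stores arrays column-major, so
// the first dimension is contiguous.
DescriptorA describe_a(double* data) {
    DescriptorA d{};
    d.base = data;
    d.rank = 2;
    d.dim[0] = {0, 2, 1};  // 3 elements, stride 1
    d.dim[1] = {1, 4, 3};  // 4 columns, each 3 elements apart
    return d;
}

// Describe the same array in layout B.
DescriptorB describe_b(double* data) {
    DescriptorB d{};
    d.rank = 2;
    d.elem_size = sizeof(double);
    d.lower[0] = 0; d.extent[0] = 3;
    d.lower[1] = 1; d.extent[1] = 4;
    d.base = data;
    return d;
}

// Total number of elements, computed from each layout.
long element_count(const DescriptorA& d) {
    assert(d.rank >= 0 && d.rank <= kMaxRank);
    long n = 1;
    for (int i = 0; i < d.rank; ++i) {
        long extent = d.dim[i].upper - d.dim[i].lower + 1;
        n *= (extent > 0 ? extent : 0);
    }
    return n;
}

long element_count(const DescriptorB& d) {
    assert(d.rank >= 0 && d.rank <= kMaxRank);
    long n = 1;
    for (int i = 0; i < d.rank; ++i) n *= d.extent[i];
    return n;
}
```

C++ code written against one of these layouts would misread a descriptor produced in the other, which is why a hand-written descriptor cannot be portable across compilers.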
Thus, while a pointer to the memory address of a Fortran 90 array may be passed to C++ (via an implicit interface), there is no standard way for C++ to call a Fortran 90 procedure that expects an array or a Fortran pointer to an array. In essence, there is no way to cast a memory address to an array in Fortran. As stated by Adams et al. [3], “currently, the only way to guarantee consistent interfaces across implementations is to write all procedures in standard Fortran.”

This paper addresses this situation by describing CHASM, a set of software tools and methods that use static code analysis to generate adapter functions that bridge the divide between C++ and Fortran 90 in a language-conforming way. The general technique is to first parse Fortran source code to discover a list of procedures and their interfaces, including the number of arguments and their types. The Program Database Toolkit (PDT) [6] is used to accomplish this task. Then, for each procedure in this list, an adapter function is automatically generated that allows C++ to call the desired Fortran procedure. Within this adapter function, argument type conversion is performed if necessary and the associated Fortran 90 procedure is called (with return values also converted as necessary).

The primary focus of this paper is to describe the adapter functions that are used as a bridge in CHASM to call Fortran 90 procedures from C++. The emphasis will be on the passing of arrays as parameters to Fortran procedures. Similar techniques could be used to generate adapter functions that allow Fortran 90 to call C++ or to share user-defined types between the two languages, but this is outside the focus of this paper and will be addressed later.

The paper proceeds as follows. First, the PDT and associated tools that allow for the static analysis of Fortran 90 source code are described. Then, the general design of the adapter functions is covered, with particular attention given to the passing of array type parameters.
Finally, a C++ array class is described that wraps Fortran 90 arrays so they can be shared with the C++ environment.

2.0 Program Database Toolkit

The Program Database Toolkit (PDT) [6] provides a framework in which application developers can access programming language constructs and modify and generate object-oriented software. As such, it is an extremely useful tool and can be used in a variety of settings. Two existing libraries that make use of the PDT are TAU (Tuning and Analysis Utilities) [7] and SILOON (Scripting Interface Languages for Object-Oriented Numerics) [8]. TAU uses the PDT to gather information about the location of function entry and exit points, and uses this information to insert calls to profiling and tracing routines so that existing source code can be automatically instrumented for performance analysis. SILOON automatically generates bridging code that lets users program in a high-level scripting language (such as Python or Perl) while accessing existing C++ libraries. SILOON uses the PDT to gather information about methods and functions in C++ source code (including the number and type of function parameters), and uses this information to generate bridging code so that these functions can be called from a scripting language.

The PDT framework works by first parsing source code (utilizing commercial compiler front ends) and then processing the resulting intermediate language (IL) trees. As it traverses an IL tree, a Fortran 90 (or C++) IL analyzer extracts the information on high-level constructs and source code locations that is needed by analysis tools and code-generating applications. This information is then made available in a human-readable “program database” (PDB) file. The DUCTAPE library [6] is then used to access the contents of the PDB file. The second release of PDT extends support to Fortran 90 and enhances the handling of C++ templates and template instantiations.
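The generation step, in which interface information (procedure name, number and types of arguments) drives the emission of one adapter per procedure, can be sketched as follows. This is a minimal illustration under stated assumptions and not CHASM's or SILOON's actual generator: the `ArgInfo` and `ProcedureInfo` records, the `adapter_` prefix, the `_extentN` parameter naming, and the type mapping (default `real` to `float`) are all hypothetical. Arguments are emitted as pointers because Fortran passes arguments by reference.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one dummy argument of a Fortran procedure.
struct ArgInfo {
    std::string name;
    std::string fortran_type;  // e.g. "integer", "real", "double precision"
    int rank;                  // 0 for scalars, 1..7 for arrays
};

// Hypothetical record for one Fortran subroutine discovered by static analysis.
struct ProcedureInfo {
    std::string name;
    std::vector<ArgInfo> args;
};

// Assumed mapping from Fortran intrinsic types to C++ types.
std::string cpp_type_for(const std::string& fortran_type) {
    if (fortran_type == "integer") return "int";
    if (fortran_type == "real") return "float";
    if (fortran_type == "double precision") return "double";
    return "void";  // unknown types become an opaque pointer
}

// Emit the C++ prototype of an adapter for a subroutine. Each array argument
// is followed by one extent parameter per dimension, so the adapter body
// could rebuild the array on the Fortran side.
std::string adapter_prototype(const ProcedureInfo& p) {
    std::string out = "void adapter_" + p.name + "(";
    bool first = true;
    for (const ArgInfo& a : p.args) {
        assert(a.rank >= 0);
        if (!first) out += ", ";
        first = false;
        out += cpp_type_for(a.fortran_type) + "* " + a.name;
        for (int d = 0; d < a.rank; ++d) {
            out += ", int " + a.name + "_extent" + std::to_string(d + 1);
        }
    }
    out += ");";
    return out;
}
```

For a subroutine `scale(a, factor)` with a rank-2 `real` array `a` and a scalar `real` `factor`, this sketch emits `void adapter_scale(float* a, int a_extent1, int a_extent2, float* factor);`.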
The IL analyzers process IL trees that are constructed during the parsing of source code. Based on the Mutek Fortran 90 [9] and Edison Design Group (EDG) C++ v2.45 [10] front ends, respectively, the Fortran 90 and C++ IL analyzers work similarly but operate on language-specific constructs. Both traverse IL trees, reporting information on designated high-level constructs as they are encountered. Separate traversals for source files, routines, user-defined types, other types, and other entities allow restructuring of the reported information.

The output of the IL analyzers is a “program database” (PDB) file consisting of descriptions for various language entities. Each description identifies an item and lists its features. A PDB file contains information on source files, routines, Fortran 90 user-defined types and modules, C++ classes, structs and unions, other types, as well as templates, namespaces, and macros. A number of changes in the database format (from PDT version 1.0) were necessary to accommodate the addition of the Fortran 90 IL analyzer.

DUCTAPE is a C++ library that provides a common, object-oriented API to the PDB files produced by both IL analyzers. Each item type of the PDB format is represented by a class having a corresponding name (e.g., the pdbRoutine class describes routines). Information about the PDB items is accessible through member functions of the DUCTAPE classes. Common attributes are factored out into generic base classes. A single class hierarchy accommodates both Fortran 90 and C++.

With the DUCTAPE library, PDT provides some useful static analysis tools. These include:

  • pdbconv converts PDB files to a more readable format;
  • pdbhtml creates web-based documentation;
  • pdbmerge merges PDB files from separate compilations; and
  • pdbtree displays file inclusion, class hierarchy, and call graph trees.

Figure 1. A schematic diagram of the static analysis of an application library by the PDT. Source code is parsed and the resulting intermediate language is processed and written to a PDB file that can be read using DUCTAPE, as shown in the left-hand side of the figure. Examples of existing applications making use of the final output are shown in the right-hand side of the figure.
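The design idea of one class per database item type, with common attributes factored into a generic base class, can be illustrated with a small sketch. The class names below (`Item`, `RoutineItem`, `TypeItem`) and their members are hypothetical stand-ins chosen for this example; they are not the DUCTAPE classes or their member functions.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class holding attributes shared by every database item.
class Item {
public:
    Item(std::string name, std::string file, int line)
        : name_(std::move(name)), file_(std::move(file)), line_(line) {
        assert(line_ > 0);  // source lines are numbered from 1
    }
    virtual ~Item() = default;

    const std::string& name() const { return name_; }
    const std::string& file() const { return file_; }
    int line() const { return line_; }

    // Each concrete item type reports what kind of entity it describes.
    virtual std::string kind() const = 0;

private:
    std::string name_;
    std::string file_;
    int line_;
};

// Hypothetical item describing a routine (Fortran procedure or C++ function).
class RoutineItem : public Item {
public:
    RoutineItem(std::string name, std::string file, int line, int arg_count)
        : Item(std::move(name), std::move(file), line), arg_count_(arg_count) {}
    int arg_count() const { return arg_count_; }
    std::string kind() const override { return "routine"; }

private:
    int arg_count_;
};

// Hypothetical item describing a user-defined type or class.
class TypeItem : public Item {
public:
    using Item::Item;  // no extra attributes in this sketch
    std::string kind() const override { return "type"; }
};

// Generic query that works for items of either source language, because it
// relies only on the base-class interface.
int count_kind(const std::vector<std::unique_ptr<Item>>& db,
               const std::string& kind) {
    int n = 0;
    for (const auto& item : db) {
        if (item->kind() == kind) ++n;
    }
    return n;
}
```

Because tools such as a call-graph viewer or an adapter generator only need the shared interface for most queries, a single hierarchy of this shape can serve entries from both Fortran 90 and C++ sources.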

Similar resources

Bridging the language gap in scientific computing: the Chasm approach

Chasm is a toolkit providing seamless language interoperability between Fortran 95 and C++. Language interoperability is important to scientific programmers because scientific applications are predominantly written in Fortran, while software tools are mostly written in C++. Two design features differentiate Chasm from other related tools. First, we avoid the common-denominator type systems and ...

FINGER: A Symbolic System for Automatic Generation of Numerical Programs in Finite Element Analysis

FINGER is a Lisp-based system to derive formulas needed in finite element analysis, and to generate FORTRAN code from these formulas. The generated programs can be used with existing, FORTRAN-based finite element analysis packages. This approach aims to replace tedious hand computations that are time consuming and error prone. The design and implementation of FINGER are presented. Techniques fo...

Experiences with Porting the Parallel CFD Code N3S to Fortran 90

We first present the difficulties we overcame in using the CF90 parallelizer-compiler to obtain a parallel version of the N3S code similar to that obtained using the CF77 parallelizer-compiler. We also discuss the stability of the results produced after this porting. We finally describe the difficulties encountered in controlling the memory consumption when moving from the Fort...

Automatic parallelization of vector parallel codes for preconditioned iterative solvers

There is a lot of effort to make programming for HPC more productive, and we aim to make our contribution. After gaining some experience in programming preconditioned iterative solvers in Fortran and MPI, we propose a new approach based on a mix of ideas from vector parallel languages and parallelizing compilers like HPF. We follow two rules, first try to vectorize our code as much as pos...

HPCM: A Pre-Compiler Aided Middleware for the Mobility of Legacy Code

Mobility is a fundamental functionality of next-generation internet computing. How to support mobility for legacy codes, however, is still a research issue. The key to solving this outstanding issue is support for heterogeneous process migration. During the last few years, we have successfully developed mechanisms to support heterogeneous process migration of legacy codes written in C,...

Publication date: 2001